Decoupled Vector Architectures
Authors
Abstract
© 1996 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
The purpose of this paper is to show that using decoupling techniques in a vector processor, the performance of vector programs can be greatly improved. Using a trace-driven approach, we simulate a selection of the Perfect Club programs and compare their execution time on a conventional vector architecture and on a decoupled vector architecture. Decoupling provides a performance advantage of more than a factor of two for realistic memory latencies, and even with an ideal memory system with no latency, there is still a speedup of as much as 50%. A bypassing technique between the load/store queues is introduced and we show how it can give up to an extra speedup of 22% while also reducing total memory traffic by an average of 20%. An important part of this paper is devoted to studying the tradeoffs involved in choosing an adequate size for the different queues of the architecture, so that the hardware cost of the queues can be minimized while still retaining most of the performance advantages of decoupling.
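The abstract does not describe how the bypass between the load/store queues works, so the following is only a minimal illustrative sketch of the general idea of store-to-load forwarding, not the paper's mechanism. It assumes a single pending-store queue that each load searches before going to memory; the class `BypassingQueues` and all of its fields and methods are hypothetical names introduced for this example.

```python
from collections import deque


class BypassingQueues:
    """Toy model (an assumption, not the paper's design): pending stores
    wait in a queue, and loads search that queue before reading memory."""

    def __init__(self, memory):
        self.memory = memory          # dict mapping address -> value
        self.store_queue = deque()    # pending (address, value), oldest first
        self.memory_reads = 0         # loads that had to access memory
        self.bypassed_loads = 0       # loads served from the store queue

    def store(self, address, value):
        # Stores are buffered; they reach memory only when drained.
        self.store_queue.append((address, value))

    def load(self, address):
        # Search youngest-first so the most recent pending store wins.
        for pending_address, value in reversed(self.store_queue):
            if pending_address == address:
                self.bypassed_loads += 1
                return value
        # No pending store matches: this load costs a memory access.
        self.memory_reads += 1
        return self.memory[address]

    def drain(self):
        # Commit buffered stores to memory in program order.
        while self.store_queue:
            address, value = self.store_queue.popleft()
            self.memory[address] = value


queues = BypassingQueues({0: 10, 8: 20})
queues.store(0, 99)
forwarded = queues.load(0)   # served from the store queue, no memory read
from_memory = queues.load(8) # no pending store, so memory is read
queues.drain()
```

In this toy model every bypassed load is one memory read avoided, which is the kind of traffic reduction the abstract reports; the real architecture operates on vector accesses and multiple queues, which this sketch does not attempt to model.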
Similar resources
Decoupled Vector Architectures: A First Look
The purpose of this paper is to show that using decoupling techniques in a vector processor, the performance of vector programs can be greatly improved. We will show how, even for an ideal memory system with no latency, decoupling provides a significant advantage over standard mode of operation. We will also present data showing that for more realistic latencies, decoupled vector architectures p...
Effective usage of vector registers in decoupled vector architectures
This paper presents a study of the impact of reducing the vector register size in a decoupled vector architecture. In traditional in-order vector architectures, long vector registers have typically been the norm. We start presenting data that shows that, even for highly vectorizable codes, only a small fraction of all elements of a long vector register are actually used. We also show that reduct...
Memory Decoupled Architectures and related issues Guest Editor’s Introduction
It is my great pleasure to serve as guest editor for this special issue of TCCA Newsletter, which is hosting eight papers from the MEDEA (MEmory DEcoupled Architectures) Workshop, jointly held with PACT-2000 conference. The rationale behind this workshop was to revive the original idea of Memory Access Decoupling, presented in the famous paper of Jim Smith, “Decoupled Access/Execute Architectur...
Decoupled Architectures for Complexity-Effective General Purpose Processors
Decoupled architectures have previously been investigated in the context of high performance scientific computing. For general purpose computing, however, superscalar processors have proven to be flexible in providing high performance across a wide range of applications. To achieve this goal, these architectures have incorporated enormous amounts of complexity to obtain modest performance impro...
Decoupling Encoder and Decoder Networks for Abstractive Document Summarization
Abstractive document summarization seeks to automatically generate a summary for a document, based on some abstract “understanding” of the original document. State-of-the-art techniques traditionally use attentive encoder–decoder architectures. However, due to the large number of parameters in these models, they require large training datasets and long training times. In this paper, we propose ...
Performance of the decoupled ACRI-1 architecture: the perfect club
This paper examines the performance potential of decoupled computer architectures on real-world codes, and includes the first performance bounds calculations to be published for the highly-decoupled ACRI-1 computer architecture. It also constitutes the first published work to report on the effectiveness of a decoupling Fortran90 compiler. Decoupling is an architectural optimisation which offers very ...
Journal title:
Volume Issue
Pages -
Publication date 1996